MOAIS - 2014 - Annual activity report

Project-Team Moais

Section: New Results

Scheduling Data Flow Program in XKaapi: A New Affinity Based Algorithm for Heterogeneous Architectures

Efficient implementations of parallel applications on heterogeneous hybrid architectures require a careful balance between computation and communication with accelerator devices. Even if most of the communication time can be overlapped with computation, it is essential to reduce the total volume of data transferred. The literature abounds with ad hoc methods to reach that balance, but these depend on the architecture and the application. In [12], we propose a generic mechanism to automatically optimize scheduling between CPUs and GPUs, and we compare two strategies within this mechanism: the classical Heterogeneous Earliest Finish Time (HEFT) algorithm and our new, parametrized Distributed Affinity Dual Approximation (DADA) algorithm, which groups tasks by affinity before running a fast dual approximation. We ran experiments on a heterogeneous parallel machine with twelve CPU cores and eight NVIDIA Fermi GPUs, using three standard dense linear algebra kernels from the PLASMA library that were ported to the XKaapi runtime system. The results show that both HEFT and DADA perform well under various experimental conditions, but that DADA performs better for larger systems and larger numbers of GPUs and, in most cases, generates much lower data transfer volumes than HEFT to achieve the same performance.
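To make the trade-off between computation and data transfer concrete, the following minimal Python sketch applies an earliest-finish-time placement rule, the selection step that HEFT is built around, to a set of independent tasks. It is an illustration under simplifying assumptions, not the XKaapi implementation and not the DADA algorithm of [12]: the names `Task`, `Resource`, and `eft_schedule` are hypothetical, all data is assumed to start in host memory, each missing input block costs a fixed `transfer_time` on a GPU, and transfers are not overlapped with computation.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpu_time: float   # run time on one CPU core (arbitrary units, assumed known)
    gpu_time: float   # run time on one GPU (arbitrary units, assumed known)
    inputs: tuple     # names of the data blocks the task reads


@dataclass
class Resource:
    name: str
    is_gpu: bool
    ready_at: float = 0.0                       # time at which the resource is free
    resident: set = field(default_factory=set)  # blocks already in device memory


def eft_schedule(tasks, resources, transfer_time=1.0):
    """Place each task on the resource where it would finish earliest.

    Hypothetical helper: returns (placement, makespan, number of block transfers).
    """
    # For independent tasks, HEFT's upward rank reduces to the mean run time,
    # so tasks are considered in order of decreasing mean cost.
    order = sorted(tasks, key=lambda t: -(t.cpu_time + t.gpu_time) / 2)
    placement, transfers = {}, 0
    for task in order:
        best = None
        for res in resources:
            # Assumption: CPUs read host memory directly; GPUs must first
            # receive every input block that is not already resident.
            missing = [d for d in task.inputs if d not in res.resident] if res.is_gpu else []
            run = task.gpu_time if res.is_gpu else task.cpu_time
            finish = res.ready_at + transfer_time * len(missing) + run
            if best is None or finish < best[0]:
                best = (finish, res, missing)
        finish, res, missing = best
        res.ready_at = finish
        res.resident.update(missing)
        transfers += len(missing)
        placement[task.name] = res.name
    makespan = max(r.ready_at for r in resources)
    return placement, makespan, transfers


demo_tasks = [
    Task("t1", cpu_time=10, gpu_time=2, inputs=("A",)),
    Task("t2", cpu_time=10, gpu_time=2, inputs=("A",)),
    Task("t3", cpu_time=3, gpu_time=3, inputs=("B",)),
]
demo_resources = [Resource("cpu0", is_gpu=False), Resource("gpu0", is_gpu=True)]
print(eft_schedule(demo_tasks, demo_resources))
```

In this toy instance, the two tasks that share block A end up on the same GPU, so A is transferred only once, while the task that gains nothing from the GPU stays on the CPU. Here that reuse falls out of the finish-time estimate; an affinity-based strategy such as DADA makes the grouping of tasks by shared data an explicit step before load balancing.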
